Abstract

Numerically intensive calculations are not […] built-in predicates and […] 4 MFLOPS on the Prolog version of some Whetstone benchmarks (in double precision).

1.  Introduction 

Contemporary Prolog execution systems provide excellent support for symbolic calculations, but are generally quite weak in their support of numeric and linear algebra calculations. Yet, some of the most interesting and challenging applications of logic programming require high performance execution of tightly coupled symbolic and numeric calculations. Examples include computer-aided design/engineering/manufacturing, sensor fusion, robotics, constraint logic programming, geometric modeling and reasoning with probabilistic evidence.

In our Aquarius project [6], one of the main applications is design automation [3], and it requires extensive numeric calculations as well as symbolic manipulations. We are investigating additional built-in predicates and macros for the Prolog language to better support numeric operations. The predicates have a semantic interpretation in a kernel subset of Prolog, but can be efficiently and directly compiled into powerful machine instructions. At execution time, most of the machine instructions are executed by a symbolic processor, the PLM [8, 9]. When the special numeric instructions are fetched by a prefetch unit, they are ignored by the symbolic processor and are acted upon by the Aquarius Numeric Processor (ANP) [15].

The ANP is a high performance vector numeric processor especially designed to support numeric operations that occur in the context of logic programming. Figure 1 shows a block diagram of this integrated ANP/PLM architecture. The ANP coprocessor of figure 1 is currently under construction using TTL and ECL parts and will be inserted into our current experimental system¹ in the near future.

We are struggling with the many conflicting issues that develop when all the complexities of logic programming, floating point calculations, and linear algebra interact with the problems of exceptions, side-effects, efficiency of execution and beauty of language expression. It is our desire not to further burden the semantics of Prolog with any additional non-logical complications, but at the same time we must provide for efficient numeric calculations if the Aquarius system is to be useful for our applications. In the sections below we explain our current choices and compromises. We fully expect that our system will evolve as we discover and solve problems and gain experience in debugging and analyzing the new ANP/PLM system.

¹Our current experimental system is a Xenologic model X-1 [7] co-processor with a Sun 3/160 host. The X-1 is an improved, commercial version of the PLM.

Figure 1: ANP/PLM System Block Diagram

2.  An  Example  of  a  Numeric  Program 

The numerically intensive calculations in science and engineering that are not well supported by Prolog include the heavy use of floating point, destructive assignment, arrays, and iteration (loops). A simple example from linear algebra (the solution of tridiagonal systems [12]) illustrates how we intend to address these points. The original Fortran code (slightly modified for illustration purposes) for this calculation is:

      SUBROUTINE TRIDAG(A, B, C, R, U, N)
      PARAMETER (NMAX=100)
      DIMENSION G(NMAX), A(N), B(N), C(N), R(N), U(N)
      H = 1
      IF (B(1) .EQ. 0.) PAUSE
      E = B(1)
      U(1) = R(1) / E
      DO 11 J = 2, N
        G(J) = C(J-1) / E
        E = B(J) - A(J) * G(J)
        IF (E .EQ. 0.) PAUSE
        U(J) = (R(J) - A(J) * U(J-1)) / E
   11 CONTINUE
      DO 12 J = N-1, 1, -1
        U(J) = (U(J) - G(J+1) * U(J+1)) * H
   12 CONTINUE
      RETURN
      END

When this code is directly translated into Prolog, one obtains the following executable code:

    tridag(A, B, C, R, NNNU, N) :-
        H is 1,
        new_array(G),
        new_array(U),
        aref(1, B, E),
        aref(1, R, R1),
        test_bet(E),
        U1 is R1 / E,
        aset(1, U, U1, NU),
        xtridag(2, N, A, B, C, R, NU, G, E, NNU, NG),
        N1 is N - 1,
        ytridag(N1, NNU, NG, NNNU, H).

    ytridag(J, U, _, U, _) :- J < 1.
    ytridag(J, U, G, NNU, H) :- J >= 1,
        J1 is J + 1,
        aref(J, U, UJ),
        aref(J1, U, UJ1),
        aref(J1, G, GJ1),
        NUJ is (UJ - GJ1 * UJ1) * H,
        aset(J, U, NUJ, NU),
        NJ is J - 1,
        ytridag(NJ, NU, G, NNU, H).

    test_bet(E) :-
        E =:= 0.0, !,
        write('E is zero in tridag'), nl,
        trace, fail.
    test_bet(_).

    xtridag(J, N, _, _, _, _, U, G, _, U, G) :- J > N.
    xtridag(J, N, A, B, C, R, U, G, E, NNU, NNG) :- J =< N,
        J1 is J - 1,
        aref(J, A, AJ),
        aref(J, B, BJ),
        aref(J1, C, CJ1),
        aref(J, R, RJ),
        aref(J1, U, UJ1),
        GJ is CJ1 / E,
        aset(J, G, GJ, NG),
        NE is BJ - AJ * GJ,
        test_bet(NE),
        UJ is (RJ - AJ * UJ1) / NE,
        aset(J, U, UJ, NU),
        NJ is J + 1,
        xtridag(NJ, N, A, B, C, R, NU, NG, NE, NNU, NNG).


The predicates aref, aset and new_array are library routines for the support of extendible arrays [13]. In this implementation, arrays are represented by balanced 4-way trees; thus, the access time is logarithmic in the size of the array. Although this time may not seem large, a 1000-element array could require more than 20 words to be written for the modification of a single element. This overhead in both time and memory is not tolerable in high performance systems. For these reasons, we have introduced new predicates for destructive (yet backtrackable) assignment.
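The array library routines cited above may not be available on a modern system. To run the translated tridag code on SWI-Prolog, a minimal stand-in for new_array/1, aref/3 and aset/4 can be sketched with library(assoc). This is a substitute under our own assumptions: assoc uses AVL trees rather than balanced 4-way trees, but it has the same non-destructive, logarithmic-time behaviour, so every aset/4 copies a path of nodes and leaves the old array intact.

```prolog
:- use_module(library(assoc)).

% Hypothetical stand-ins for the extendible-array library routines.
% An array is an AVL tree (library(assoc)) mapping Index -> Element.

% new_array(-Array): create an empty array.
new_array(Array) :-
    empty_assoc(Array).

% aref(+Index, +Array, -Element): O(log n) lookup; fails if unset.
aref(Index, Array, Element) :-
    get_assoc(Index, Array, Element).

% aset(+Index, +Array, +Element, -NewArray): O(log n) non-destructive
% update; Array itself is unchanged and NewArray shares untouched nodes.
aset(Index, Array, Element, NewArray) :-
    put_assoc(Index, Array, Element, NewArray).
```

Because the old version stays valid after every update, each modification pays for copying a path of tree nodes; that persistence is what the ANP design gives up in exchange for constant-time destructive assignment.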

It is apparent that the readability of the Prolog code is much less than that of the Fortran code. Much of the clutter is caused by not being able to compute array elements directly in the assignment (is) statements. This problem is easily corrected by the use of a macro (called *=) which allows user-defined functions to be placed in assignment statements. We use square brackets to denote array indexing. This does not conflict with list notation because the brackets always immediately follow a variable, a position in which a list cannot appear in standard Prolog syntax.

The Fortran DO loop must be replaced with recursion in Prolog. This in itself is fine, but often a large number of active variables must be passed to the recursive predicate. Lengthy argument lists introduce unnecessary tedium for the programmer and are sources of error. The use of an iteration macro (called do) will alleviate this problem. The two macros *= and do are described in the next section. The Prolog code can be rewritten as follows.
    tridag(A, B, C, R, U, N) :-
        H is 1,
        E[1] *= B[1],                            % boundary
        V[1] *= R[1] / B[1],
        do(J, 2, v_length(A),                    % from 2 to length of A
           (G[J] *= C[J-1] / E[J-1],             % induction
            E[J] *= B[J] - A[J] * G[J],
            V[J] *= (R[J] - A[J] * V[J-1]) / E[J])),
        U[N] *= V[N],
        do(J, N-1, 1, -1,                        % from N-1 to 1 step -1
           (U[J] *= (V[J] - G[J+1] * U[J+1]) * H)).

It  can  be  seen  that  the  use  of  the  new  notation  restores  the  clarity  of  the  algorithm. 
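As a cross-check of the algorithm itself, the tridiagonal solve can be written in plain SWI-Prolog without the proposed macros. In this sketch (our own assumption, not the paper's macro expansion), vectors are compound terms v(X1, ..., Xn), functor/3 creates fresh single-assignment vectors, and each *= statement becomes an arg/3 lookup followed by is/2; the two do loops become the recursive predicates forward/9 and backward/4. A zero pivot makes the call fail instead of pausing, and the factor H is omitted because it is 1.

```prolog
% Vectors are compound terms v(X1, ..., Xn); A(1) and C(n) are unused.
% Every element of E, V, G and U is bound exactly once by unification,
% which is the single-assignment (*=) reading of the tridag program.
tridag(A, B, C, R, U) :-
    functor(A, Name, N),
    functor(E, Name, N), functor(V, Name, N),
    functor(G, Name, N), functor(U, Name, N),
    arg(1, B, B1),
    B1 =\= 0,                                % fail on a zero pivot
    arg(1, E, B1),                           % E[1] *= B[1]
    arg(1, R, R1), arg(1, V, V1),
    V1 is R1 / B1,                           % V[1] *= R[1] / B[1]
    forward(2, N, A, B, C, R, E, V, G),
    arg(N, V, VN), arg(N, U, VN),            % U[N] *= V[N]
    N1 is N - 1,
    backward(N1, G, V, U).

% Forward elimination, J = 2 .. N.
forward(J, N, _, _, _, _, _, _, _) :- J > N, !.
forward(J, N, A, B, C, R, E, V, G) :-
    J0 is J - 1,
    arg(J0, C, CJ0), arg(J0, E, EJ0), arg(J, G, GJ),
    GJ is CJ0 / EJ0,                         % G[J] *= C[J-1] / E[J-1]
    arg(J, A, AJ), arg(J, B, BJ), arg(J, E, EJ),
    EJ is BJ - AJ * GJ,                      % E[J] *= B[J] - A[J] * G[J]
    EJ =\= 0,
    arg(J, R, RJ), arg(J0, V, VJ0), arg(J, V, VJ),
    VJ is (RJ - AJ * VJ0) / EJ,              % V[J] *= (R[J] - A[J]*V[J-1]) / E[J]
    J1 is J + 1,
    forward(J1, N, A, B, C, R, E, V, G).

% Back substitution, J = N-1 down to 1.
backward(J, _, _, _) :- J < 1, !.
backward(J, G, V, U) :-
    J1 is J + 1,
    arg(J, V, VJ), arg(J1, G, GJ1), arg(J1, U, UJ1), arg(J, U, UJ),
    UJ is VJ - GJ1 * UJ1,                    % U[J] *= V[J] - G[J+1] * U[J+1]
    J0 is J - 1,
    backward(J0, G, V, U).
```

For example, the system with diagonal (2, 2, 2), off-diagonals 1 and right-hand side (3, 4, 3) has the solution (1, 1, 1).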


3.  Prolog  Language  Issues 

We now discuss some important syntactic and semantic issues in providing a clean interface between the ANP hardware and the Prolog language. There are two general approaches to combine numeric computation with Prolog: 1) Include a second language which allows efficient implementation (but with procedural semantics) and an external language interface to Prolog. In this scheme the declarative semantics are lost for the system as a whole. 2) Extend Prolog to allow numeric computation. With care, the semantics of Prolog will be retained while allowing an efficient implementation. This is the approach we take in this design.

Our extension to the language is guided by three principles. First, it must allow an efficient implementation. Second, the logical semantics of Prolog should be kept. And third, it must be clean for the application programmer. Our approach has two facets: 1) Introduce new scalar and vector numeric types and operations which are then directly supported by the ANP, and extend the semantics of Prolog primitives for the new types. 2) Introduce a simple but powerful macro facility and a data typing scheme to allow concise scientific programming. We describe a set of suggested macros and built-in predicates which allow this while retaining the logical semantics.


3.1.  The  Macro  Facility 

Our macro facility is similar to that of SB-Prolog [2]. The scheme used in ESP [4] was rejected because of its complexity. The SB-Prolog macro facility will expand a goal inline and partially evaluate it. This simple idea is quite powerful. It will suffice to implement our ideas if the partial evaluator is general enough, and if assert is given the proper interpretation.
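Inline expansion with partial evaluation can be tried in SWI-Prolog, whose goal_expansion/2 hook rewrites goals in clause bodies at compile time. This is an analogue, not the SB-Prolog facility used here, and twice/2 is a hypothetical macro. The first expansion clause partially evaluates calls whose argument is already a number; the second expands the general case inline.

```prolog
% Compile-time macro expansion via SWI-Prolog's goal_expansion/2 hook.
:- multifile user:goal_expansion/2.

% Partial evaluation: twice(3, Y) becomes Y = 6 at compile time.
user:goal_expansion(twice(X, Y), Y = V) :-
    number(X), !,
    V is X * 2.
% General case: inline arithmetic, so no twice/2 predicate is ever called.
user:goal_expansion(twice(X, Y), Y is X * 2).

double_it(X, Y) :- twice(X, Y).   % body is compiled as Y is X * 2
six(Y) :- twice(3, Y).            % body is compiled as Y = 6
```

Since every call is rewritten before the clause is compiled, twice/2 never needs to exist as a run-time predicate.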

3.1.1.  Assignment  and  User-defined  Functions 

In order to denote array assignment and user-defined functions in a concise manner we introduce the := and *= macros. These expand array references into explicit calls of new built-ins which access the array elements. They also expand function calls into goals with an additional argument that will contain the function value. The *= macro unifies its arguments (thus keeping the logical semantics), while the := macro destructively assigns (with restoration of the value on backtracking). For example, the macro call A[3] := f(X) will be expanded into the two goals f(X, T), rplacarg(3, A, T). Part of the definition of := is:
    (L := R) :- right_eval(R, X), left_eval(L, X).

    right_eval(X, X) :- (number(X) ; var(X)), !.
    right_eval(Aref, X) :- array_ref(Aref),
        Aref =.. [Aname|Args],
        eval_index(Args, Index),
        aref(Aname, Index, X).
    right_eval(A+B, X) :-
        right_eval(A, VA),
        right_eval(B, VB),
        X is VA + VB.
    %   ... similar clauses for the other operations
    right_eval(Func, X) :- function_call(Func),
        Func =.. [Fname|Args],
        append(Args, [X], AllArgs),
        Call =.. [Fname|AllArgs],
        call(Call).

The *= macro is similarly defined.
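The expansion scheme can also be tried at run time instead of compile time. The sketch below rests on our own assumptions: since standard Prolog has no A[I] syntax, an array reference is written ref(Arr, I) over a compound term v(X1, ..., Xn); unify_assign/2 plays the role of *= (binding by unification) and destructive_assign/2 the role of := (using SWI-Prolog's backtrackable setarg/3 in place of rplacarg). Unlike the definition above, function arguments are evaluated before the call.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% right_eval(+Expr, -Value): numbers, array references ref(Arr, I),
% arithmetic operators and user-defined functions.
right_eval(X, _) :-
    var(X), !,
    throw(error(instantiation_error, right_eval/2)).
right_eval(X, X) :-
    number(X), !.
right_eval(ref(Arr, IExpr), X) :- !,
    right_eval(IExpr, I),
    arg(I, Arr, X).
right_eval(-A, X) :- !,
    right_eval(A, VA), X is -VA.
right_eval(A + B, X) :- !,
    right_eval(A, VA), right_eval(B, VB), X is VA + VB.
right_eval(A - B, X) :- !,
    right_eval(A, VA), right_eval(B, VB), X is VA - VB.
right_eval(A * B, X) :- !,
    right_eval(A, VA), right_eval(B, VB), X is VA * VB.
right_eval(A / B, X) :- !,
    right_eval(A, VA), right_eval(B, VB), X is VA / VB.
right_eval(Func, X) :-                   % f(Args) is called as f(Args, X)
    compound(Func),
    Func =.. [Name|Args],
    maplist(right_eval, Args, Vals),
    append(Vals, [X], AllArgs),
    Call =.. [Name|AllArgs],
    call(Call).

% Arr[I] *= Expr: bind the element by unification (logical).
unify_assign(ref(Arr, IExpr), Expr) :-
    right_eval(IExpr, I),
    right_eval(Expr, X),
    arg(I, Arr, X).

% Arr[I] := Expr: destructive update that is undone on backtracking;
% setarg/3 stands in for rplacarg.
destructive_assign(ref(Arr, IExpr), Expr) :-
    right_eval(IExpr, I),
    right_eval(Expr, X),
    setarg(I, Arr, X).

% Example user-defined function: sq(X) evaluates to X * X.
sq(X, Y) :- Y is X * X.
```

For instance, unify_assign(ref(A, 3), ref(A, 1) + ref(A, 2) * 2) on A = v(1.0, 2.0, _) binds the third element to 5.0.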

3.1.2.  Denoting  Iteration 

We suggest a do macro to denote iteration in a clean way similar to a Fortran DO loop:

    do(Index_name, Start_index, End_index, Optional_increment, (Body))

All objects mentioned in the body of the […] This macro allows the use of destructive assignment in the body. However, […] then the logical semantics are kept. The macro expands into a recursive predicate of the following form, which is put in the global data base:

    x_do(Index_name, End, ...BodyVars...) :- Index_name > End.
    x_do(Index_name, End, ...BodyVars...) :- Index_name =< End,
        Body,
        New_index is Index_name + Optional_increment,
        x_do(New_index, End, ...BodyVars...).

The inline code is a series of goals initializing the body's variables (as denoted by BodyVars), followed by the call x_do(Start_index, End_index, ...BodyVars...).
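A run-time analogue of this expansion can be written as a higher-order predicate. This is a sketch under our own assumptions: the body is passed as a closure that receives the index as its last argument (instead of sharing a named index variable), and the names do_loop/4 and past_end/3 are ours. Unlike the x_do expansion shown above, the end test depends on the sign of the increment, so descending loops such as do(J, N-1, 1, -1, ...) also terminate correctly.

```prolog
% do_loop(+Start, +End, +Step, :Body): call(Body, I) for I = Start,
% Start+Step, ... until I passes End.
do_loop(Start, End, Step, Body) :-
    Step =\= 0,
    x_do(Start, End, Step, Body).

x_do(I, End, Step, _) :-
    past_end(I, End, Step), !.
x_do(I, End, Step, Body) :-
    call(Body, I),
    Next is I + Step,
    x_do(Next, End, Step, Body).

past_end(I, End, Step) :- Step > 0, I > End.
past_end(I, End, Step) :- Step < 0, I < End.

% Example loop bodies (hypothetical); each receives the index last.
square_into(Arr, I) :-                   % Arr[I] *= I * I
    X is I * I,
    arg(I, Arr, X).
count_down(Arr, I) :-                    % Arr[I] *= Arr[I+1] + 1
    I1 is I + 1,
    arg(I1, Arr, Y),
    X is Y + 1,
    arg(I, Arr, X).
```

A plain recursion is used deliberately: forall/2 would run the body inside a double negation, so bindings and setarg/3 updates made by the body would be undone after each iteration.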

3.2. Backtracking Semantics

Ideally the semantics of the numeric operations would fit into pure Prolog, with single-assignment vectors and restoration on backtracking. This can be achieved sometimes, for example […] used in the tridag example. We do not presently see how it can be achieved in general while keeping the high performance. As a compromise, we will present a design which achieves efficiency with no greater violation of the logical semantics than the var predicate.

Prolog together with a backtrackable destructive assignment (which we call rplacarg) is no less logical than Prolog with var, because rplacarg can be implemented with var […]. If rplacarg is implemented directly in the underlying architecture, it can execute in constant time. We conjecture that Prolog with this implementation of rplacarg can achieve the same time bound as a procedural language for any problem.
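One way to see why a backtrackable destructive assignment need not be less logical than var/1 is to build a mutable cell from ordinary logic variables. The construction below is an illustration of our own (the paper's argument is not legible here): a cell is an open list whose last bound element is the current value, var/1 finds the end, and a write binds the open tail, so ordinary backtracking restores the previous value. Reading costs time proportional to the number of writes, which is exactly the overhead that a direct, constant-time implementation (such as SWI-Prolog's setarg/3) removes.

```prolog
% new_cell(+V0, -Cell): a cell is an open list [V0, V1, ... | Tail].
new_cell(V0, [V0|_]).

% cell_value(+Cell, -X): the current value is the last bound element;
% var/1 detects the unbound tail that ends the history.
cell_value([V|T], X) :-
    (   var(T)
    ->  X = V
    ;   cell_value(T, X)
    ).

% cell_set(+Cell, +X): "assign" by binding the open tail. The binding
% is an ordinary one, so backtracking undoes it automatically.
cell_set(Cell, X) :-
    cell_tail(Cell, [X|_]).

cell_tail([_|T], New) :-
    (   var(T)
    ->  T = New
    ;   cell_tail(T, New)
    ).
```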
A block of floating point operations is implemented by loading the ANP registers (see below) from the heap, doing the calculations, and finally storing the results. Trailing of the ANP registers is never done; only the heap is trailed.

As a result of the above reasoning, we require that vectors must be restored on backtracking just like other Prolog terms. Destructive assignment is allowed as long as the old value can be restored. There are two methods to achieve this. The first way is to trail all floating point stores to the heap. Note that loads and numeric operations do not need to be trailed. The second way is to trail before the first assignment after a choice point is created, and then trail only those vectors which will be changed. The choice of which of these methods to use is up to the compiler. For efficiency it will attempt to keep all trail checking out of the inner loops. One possible optimization is to recognize that if multiple assignments to the same element are done between choice point creations, then only the first needs to be trailed.
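The second trailing method, with the optimization of trailing only the first assignment to an element between choice points, can be modelled in a few lines. This is an illustrative model under our own assumptions, not the ANP compiler's scheme: the vector is an assoc, the trail is an explicit list of old(Index, Value) entries and cp markers (newest first), and backtrack/2 plays the role of failure returning to the most recent choice point.

```prolog
:- use_module(library(assoc)).

% State s(Vec, Trail): Vec maps Index -> Value; Trail holds
% old(Index, Value) entries and cp markers, newest first.

% Creating a choice point pushes a marker.
choice_point(s(Vec, Trail), s(Vec, [cp|Trail])).

% assign(+I, +X, +S0, -S): destructive update of element I. The old
% value is trailed only if I was not already trailed since the last cp.
assign(I, X, s(Vec0, Trail0), s(Vec, Trail)) :-
    get_assoc(I, Vec0, Old),
    (   trailed_since_cp(I, Trail0)
    ->  Trail = Trail0
    ;   Trail = [old(I, Old)|Trail0]
    ),
    put_assoc(I, Vec0, X, Vec).

trailed_since_cp(I, [Entry|Rest]) :-
    Entry \== cp,
    (   Entry = old(I, _)
    ->  true
    ;   trailed_since_cp(I, Rest)
    ).

% backtrack(+S0, -S): restore every value saved since the newest cp,
% then pop that cp.
backtrack(s(Vec0, Trail0), s(Vec, Trail)) :-
    undo(Trail0, Vec0, Vec, Trail).

undo([cp|Trail], Vec, Vec, Trail).
undo([old(I, Old)|Rest], Vec0, Vec, Trail) :-
    put_assoc(I, Vec0, Old, Vec1),
    undo(Rest, Vec1, Vec, Trail).
```

Two assignments to element 1 after a choice point leave only one trail entry for it, yet backtracking still restores the original value.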

3.3.  Exception  Handling 

Floating point exceptions are handled by means of failure. The existing failure mechanism in Prolog is already set up to handle state restoration and continued execution. In order to use this we propose the addition of two global facts which are always accessible to the program: exception_enable(ExceptionList) and exception_occurrence(Exception), where exception_enable is given a list of flags telling which exception condition(s) will cause failure, and exception_occurrence will unify with the last exception which has actually caused failure. The programmer is not obliged to use these two facts as long as he realizes what the cause of a failure is. We implement and support the standard IEEE exceptions and proposed handling scheme [1, 15] and some ANP-specific exceptions (such as bounds violation) in this design. However, we do not support user-defined exceptions.

As an example of the use of these predicates, consider the following numeric code, which can fail both through an exception and through design (i.e. choosing an alternate algorithm if the first one is inadequate):

    routine(...) :- algorithm1, !.                                % 1st algorithm
    routine(...) :- exception_occurrence(none), algorithm2, !.    % 2nd algorithm
    routine(...) :- exception_occurrence(E), not(E = none), exception_handler.

The nesting of exception handlers is provided naturally through the failure mechanism. A failure caused by an exception will restore the vectors at the most recent choice point, and go to the next clause, which could contain an exception handler. Consider this example:

    Sequence of goals:  ..., algo(...), ...    % execution continues here, after algo(...)

    algo(...) :- code1 ... <exception occurs here> ... code2.
    algo(...) :- exception_handler.

If the handler succeeds then execution continues at the deepest goal containing the exception which has created a choice point, in this case algo. It does not continue at the exact point of occurrence of the exception. If the handler is not able to continue then it also will fail and execution will continue at the next higher handler in the hierarchy. In the example this happens when the call algo(...) fails.
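The three-clause routine pattern can be exercised in SWI-Prolog under some assumptions of our own: exception_occurrence/1 reads a non-backtrackable global variable (the hypothetical key last_exception), so the record survives the failure it describes; fdiv/3 is a hypothetical division that records divide_by_zero and fails; and the first clause resets the record before trying the first algorithm. The first algorithm rejects negative quotients by design, so the second clause is reached through ordinary failure, while a zero divisor reaches the handler.

```prolog
% Stand-in for the proposed exception_occurrence/1 fact.
:- nb_setval(last_exception, none).

exception_occurrence(E) :-
    nb_getval(last_exception, E).

% fdiv(+X, +Y, -Z): division that signals divide-by-zero by failing.
% SWI-Prolog by default raises an evaluation error instead, so the
% divisor is tested first.
fdiv(X, Y, Z) :-
    (   Y =:= 0
    ->  nb_setval(last_exception, divide_by_zero),
        fail
    ;   Z is X / Y
    ).

routine(X, Y, Z) :-                  % 1st algorithm: rejects negative results
    nb_setval(last_exception, none),
    fdiv(X, Y, Z),
    Z >= 0, !.
routine(X, Y, Z) :-                  % 2nd algorithm: only if no exception
    exception_occurrence(none),
    fdiv(X, Y, Z0),
    Z is abs(Z0), !.
routine(_, _, Z) :-                  % exception handler
    exception_occurrence(E),
    E \== none,
    handle(E, Z).

handle(divide_by_zero, 0.0).        % hypothetical recovery value
```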
The addition of two global facts which change during execution harms the logical semantics. We feel quite
strongly  that  this  should  be  rectified  but  we  have  not  yet  been  able  to  invent  a  satisfactory  solution  that  is  sufficiently 
efficient.  Thus  we  merely  make  visible  the  hardware  exception  registers  to  the  Prolog  programmer  [10, 14]. 

4.  Machine  Programming  Model 

Because the Aquarius Numeric Processor (ANP) is a coprocessor to the Programmed Logic Machine (PLM), it inherits the data types and programming model from the PLM [8, 9]. It adds new data types to the programming model including, in both scalar and vector forms, integer, single and double precision floating point numbers in IEEE standard (754) form [1]. An extended numeric register set and a large repertoire of integer and floating point operations are provided for these new data types.

4.1.  Representation  of  the  Numeric  Data  Types 

Data in the PLM programming model is represented by 32-bit tagged words. There are four primary types, list, structure, variable and constant, which are distinguished by bit<31:30>. These are shown in figure 2. Bit<29> is a cdr bit which is used for compact list representation, and bit<28> is a garbage collection bit. This bit is reserved for data marking during garbage collection. Bit<27:26> of a constant data type further differentiate between a 26-bit small integer (00), other-numeric header (01), an atom (10) and a nil (11). This tagging information allows efficient manipulation of data by applying different strategies to operate on each class of data. Although data typing yields efficient execution, it decreases the amount of information that can be stored within each data word.
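The tag fields described above can be extracted with shifts and masks. In this sketch the mapping of the four 2-bit primary codes to list, structure, variable and constant is shown only in figure 2, which is not reproduced, so the primary tag is returned as a raw value; the two's-complement layout of the 26-bit small integer in bit<25:0> is also our assumption.

```prolog
% plm_word_fields(+Word, -Primary, -Cdr, -GC, -Secondary):
% split a 32-bit PLM word into the tag fields described in the text.
plm_word_fields(W, Primary, Cdr, GC, Secondary) :-
    Primary   is (W >> 30) /\ 3,     % bit<31:30>: primary type (raw code)
    Cdr       is (W >> 29) /\ 1,     % bit<29>: cdr bit
    GC        is (W >> 28) /\ 1,     % bit<28>: garbage collection bit
    Secondary is (W >> 26) /\ 3.     % bit<27:26>: constant sub-type

% Constant sub-types selected by bit<27:26>.
constant_subtype(0, small_integer).  % 00
constant_subtype(1, other_numeric).  % 01
constant_subtype(2, atom).           % 10
constant_subtype(3, nil).            % 11

% small_int_value(+Word, -Value): assumes two's complement in bit<25:0>.
small_int_value(W, V) :-
    U is W /\ 0x3FFFFFF,
    (   U >= 0x2000000
    ->  V is U - 0x4000000
    ;   V = U
    ).
```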

Several  new  data  types  are  added  to  the  ANP  for  numeric  computations.  The  fundamental  numeric  data  types 
are 32- and 64-bit integer and single and double precision floating point numbers. Arrays based on these fundamental data types can be constructed in single and multi-dimensional forms. Integers and floating point numbers for computation in the ANP conform to the IEEE Standard P754 [1].

4.1.1.  Structure  Numeric  Representation 

The IEEE Standard for binary floating-point arithmetic specifies numeric operands to be a multiple of a 32-bit word, except for the recommended extended format, which is 80 bits long. To maintain compatibility with this standard as well as the PLM execution model, an additional 32-bit word is needed to store data type information. The Structure Numeric Representation (SNR, figure 3) utilizes a structure pointer to the numeric operand it is representing. The structure pointer has a 28-bit address pointing to the location of the numeric operand on the heap. The first entry of the numeric operand is a header which has a constant primary tag, garbage collection and cdr bits, an other-numeric secondary tag (bit<27:26> = 01), four bits of numeric tags and a 16-bit vector length. The numeric tags specify the extended data types, which include vector/scalar (V), double/single precision (D), floating-point/integer (F) and unsigned/signed (S) of the operand. Tag space is also provided for additional numeric types such as infinite precision integers, multi-word bit vectors, decimal, and complex numbers that may be added in the future. Since the PLM data path cannot directly operate on the 32-bit numeric operands, the entire numeric structure addressed by an indirect pointer will not be transferred into the PLM register set, but into the ANP instead. This coding scheme is compatible with the IEEE standard at the expense of less efficient execution and more memory storage for numeric operands.

4.2. Dynamic Operand Coercion

Many numeric operations generally appear in the instruction sets of scientific processors. Often a subset of equivalent scalar opcodes appear in vectorized forms as well. Normally the programmer (or the compiler) chooses the correct opcodes for the data types used in each program. For general programs, code for testing the input data types must be added to accommodate the dynamic nature of the input. There are two undesired side-effects in this method: 1) The extra code increases the size of the program, thus increasing the demand on a generally critical system resource, input/output to main memory. 2) The added test and branch opcodes decrease the efficiency of the processor's (pre-)fetching mechanism. The second side-effect is greatly magnified in a vector processing system in which the functional units are pipelined. We thus choose to support Dynamic Operand Coercion (DOC) [15]. Programmers can describe the numeric operations that are required to accomplish a goal without consideration of the input data types involved. The ANP will do dynamic type checking and coerce the arguments if necessary. The implementation is such that there is no overhead when no coercion is done (i.e. if the types are identical).
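The effect of DOC can be sketched as a single generic ADD that coerces its operands at run time. The coercion rules are not spelled out in the text, so this sketch assumes a simple widening order integer < single < double; operands are written Type-Value, and since Prolog floats are doubles, single precision is not modelled separately. The first coerce_pair/5 clause mirrors the zero-overhead case of identical types.

```prolog
% Assumed widening order for this sketch: integer < single < double.
type_rank(integer, 0).
type_rank(single, 1).
type_rank(double, 2).

% coerce_pair(+T1-V1, +T2-V2, -T, -W1, -W2): bring both operands to the
% wider type T. Identical types take the first clause: no conversion.
coerce_pair(T-V1, T-V2, T, V1, V2) :- !.
coerce_pair(T1-V1, T2-V2, T, W1, W2) :-
    type_rank(T1, R1),
    type_rank(T2, R2),
    (   R1 >= R2 -> T = T1 ; T = T2 ),
    convert_value(T1, T, V1, W1),
    convert_value(T2, T, V2, W2).

% convert_value(+From, +To, +V, -W)
convert_value(T, T, V, V) :- !.
convert_value(integer, _, V, W) :- W is float(V).
convert_value(single, double, V, V).

% anp_add(+T1-V1, +T2-V2, -T-R): one generic ADD for all operand types.
anp_add(T1-V1, T2-V2, T-R) :-
    coerce_pair(T1-V1, T2-V2, T, W1, W2),
    R is W1 + W2.
```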
Figure 2: PLM Data tagging representation


Figure  3:  Structure  Numeric  Representation 
4.3. Extended Numeric Register Set

The architecture of the ANP adds a number of data and state registers to the PLM programming model. There are an additional eight general purpose data registers that can be configured for scalar (F0 - F7) or vector (B0 - B7) storage. In addition to these eight data registers, there are 40 scalar registers (F8 - F47), 16 scratch pad registers reserved for internal use (F48 - F63), 128 predefined constants, and two control and status registers. Each data register can store a 32-bit integer or a single or double precision floating point number. The data type and vector length of each vector is stored in the corresponding 32-bit header register. Figure 4 shows the combined ANP/PLM register set.

There are two registers in the ANP for status and control communication between the ANP, the PLM and the memory unit. System parameters can be written to the control register (CR), and flags can be read from the status register (SR) for debugging.


4.4. The ANP Instruction Set

Data movement instructions provide a means to load or store programmer-visible registers in the ANP. Address calculation for these instructions is done in the PLM. The PLM fetches the numeric data structure from the heap addressed by an Ai register, or writes data provided by the ANP to the top of the heap.

FMOVxx instructions move data between an element of a Bi register and an Fi register. This allows efficient access to individual elements of an array. FMOVL/FMOVS use an 8-bit immediate index for Bi access. FMOVLF/FMOVSF use the value of an Fi register modulo 256 as an index for Bi access.
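The two indexing modes can be sketched with a vector register modelled as a 256-argument term. We assume here that the L and S suffixes mean load from and store into the vector register; fmovl/3, fmovs/3 and fmovlf/3 are hypothetical names, and arg/3 is 1-based, so element k is argument k+1. Note that Prolog's mod takes the sign of the divisor, so a negative register value still yields an index in 0..255.

```prolog
% A vector register Bi is modelled as a term with 256 arguments;
% element k (0..255) is argument k+1.
new_vreg(B) :-
    functor(B, b, 256).

% FMOVL (assumed load): 8-bit immediate index.
fmovl(Imm, B, X) :-
    between(0, 255, Imm),
    I is Imm + 1,
    arg(I, B, X).

% FMOVS (assumed store): 8-bit immediate index; setarg/3 keeps the
% store backtrackable.
fmovs(Imm, B, X) :-
    between(0, 255, Imm),
    I is Imm + 1,
    setarg(I, B, X).

% FMOVLF: the index is the value of an F register modulo 256.
fmovlf(F, B, X) :-
    I is (F mod 256) + 1,
    arg(I, B, X).
```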


Figure 4: Extended PLM/ANP register set
Table 2: The ANP Instruction Set

Data Movement: FLOAD, FLOAD CR, FSTORE, FSTORE SR, FMOVL INDEX, FMOVLF INDEX, FMOVS INDEX, FMOVSF INDEX, FMOVE, FLOAD_MINDEX, FLOAD_IMINDEX, FSTORE_MINDEX, FSTORE_IMINDEX
Arithmetic: ADD, ADDA, SUB, SUBA, SUBX, SUBXA, MULT, MULTA, DIV
Logical: NAND, ANDNX, ANDNY, AND, ORNX, ORNY, OR, NOR, XNOR, XOR, NOTX, NOTY, PASSX, PASSY, SET, CLR
Bit: LS, AS, ROT
Conversion: USP, UDP, ISP, IDP, SPU, SPI, SPDP, DPU, DPI, DPSP
Monadic: ABS, NEG
Compare: CMP, MAX, MIN
Compound: SQRT, MAC, SMAC, MACS
Misc.: CLF, NOP

ABSolute and NEGate operations can be applied to all numeric data types in scalar and vector forms. The dyadic instructions include several classes of numeric functions. Arithmetic functions such as add, subtract, multiply and divide are supported in both normal form and absolute forms (e.g. Y is |A + B|). A full set of logical and bit manipulation operations is included for signed and unsigned integer data types. Shift and rotate instructions accept a shift count as an immediate value or from a numeric register.

Comparison instructions are used to test conditions for branching instructions and to select the maximum or minimum value from a set of numbers. Compound instructions are microcoded sequences of the basic operations. For example, MAC/SMAC/MACS calculates the inner product of two vectors. Conversion instructions provide a means to change between data formats.
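What a multiply-accumulate compound instruction computes can be stated as a fold over two vectors. This sketch uses foldl/5 from SWI-Prolog's library(apply) over lists; it shows only the arithmetic, not how the ANP microcode sequences it, and does not distinguish the MAC, SMAC and MACS variants, whose differences the text does not describe.

```prolog
:- use_module(library(apply)).

% mac_step(+X, +Y, +Acc0, -Acc): one multiply-accumulate, Acc = Acc0 + X*Y.
mac_step(X, Y, Acc0, Acc) :-
    Acc is Acc0 + X * Y.

% inner_product(+Xs, +Ys, -P): fold mac_step/4 over two lists; foldl/5
% fails if the lists differ in length.
inner_product(Xs, Ys, P) :-
    foldl(mac_step, Xs, Ys, 0.0, P).
```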


5.  ANP  Architecture 

The purpose of the ANP is to supplement the PLM symbolic processor with high performance numeric operations while maintaining upward compatibility with the existing PLM's Instruction Set Architecture. This is accomplished with an extension of numeric data types and instructions, as described in the previous sections, and an architecture that efficiently supports these new extensions. The ANP functions as a slave coprocessor to the PLM. The programmer perceives the PLM/ANP execution model as if all numeric instructions are executed in the PLM. In systems where an ANP is not present, numeric operations are emulated in software via traps to the host processor.

A Private Memory Bus (PMB) connects the PLM to its memory system. The ANP utilizes the PMB to provide a logical extension of the PLM registers and instructions in a manner which is transparent to the programmer. The ANP consists of five independent functional units operating concurrently to achieve high performance in numeric computations [14, 15]. The Bus Interface Unit (BIU) is responsible for all communications between the ANP, the PLM and the memory system. The Operand Coercion Unit (OCU) provides operand type checking, coercion (DOC) and vector length management for the Execution Unit. The Storage Unit (SU) consists of 64 header registers, 64 scalar registers, eight 256-element vector registers, and 128 predefined constants. The Execution Unit (EU) contains the data path for integer and floating point operations. The heart of the ANP is a Micro Control Unit (MCU), which consists of a microprogram sequencer, a 96-bit horizontal writable control store, and other circuitry that handles exception processing and initialization of microcode. A block diagram of the ANP is shown in figure 5.


6.  Performance  Measurements 

Evaluation of the ANP is done in two steps. First, a register transfer level simulator provides a means for the evaluation of the microarchitecture of the ANP. Second, a hardware implementation will be constructed and tested with calculations that are too large to be simulated. Preliminary performance measurements were obtained from simulation of the design using a set of benchmark programs written in Prolog.

6.1. Measurement Results

A set of Prolog programs translated from selected (double precision) Whetstone benchmark modules [5] is used to verify the correctness and measure the performance of the PLM/ANP system. The second of the Whetstone programs (wh2) is shown below to illustrate the style of some selected Whetstone benchmark modules written in Prolog with our new array notations.


Figure  5:  ANP  Simplified  Block  Diagram 
    wh2(0, A, _, A) :- !.
    wh2(N, E, T, Y) :-
        A[0] *= (E[0] + E[1] + E[2] - E[3]) * T,
        A[1] *= (A[0] + E[1] - E[2] + E[3]) * T,
        A[2] *= (A[0] - A[1] + E[2] + E[3]) * T,
        A[3] *= (-A[0] + A[1] + A[2] + E[3]) * T,
        M is N - 1,
        wh2(M, A, T, Y).
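For comparison, the same module can be written with only standard is/2 arithmetic, which also gives a reference for checking what the array notation should compute. In this sketch (our own, not the benchmarked code) the four-element vector is a term v(E0, E1, E2, E3) and each iteration builds a fresh vector, matching the single-assignment *= reading; the guard N > 0 is added so that negative counts fail instead of looping.

```prolog
% wh2(+N, +E, +T, -Y): N iterations of Whetstone module 2 on the
% four-element vector E = v(E0, E1, E2, E3).
wh2(0, A, _, A) :- !.
wh2(N, v(E0, E1, E2, E3), T, Y) :-
    N > 0,
    A0 is (E0 + E1 + E2 - E3) * T,
    A1 is (A0 + E1 - E2 + E3) * T,
    A2 is (A0 - A1 + E2 + E3) * T,
    A3 is (-A0 + A1 + A2 + E3) * T,
    M is N - 1,
    wh2(M, v(A0, A1, A2, A3), T, Y).
```

Each iteration performs 16 floating point operations (three additions or subtractions and one multiplication per element), which matches the flop count for wh2 in Table 3.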

Table  3  shows  the  measurements  obtained  from  simulation  of  the  ANP  architecture.  The  second  column 
shows  the  number  of  floating  point  operations  (flop)  in  one  iteration  of  the  corresponding  benchmark.  Columns 
three  to  six  show  the  variation  in  mega-flops  (MFLOPS)  when  each  benchmark  is  run  for  one  hundred,  one 
thousand,  ten  thousand  and  one  hundred  thousand  iterations. 


Table 3: Simulated Benchmark Performance (units in MFLOPS)

                    iterations (double precision calculations)
test    flop    100      1K       10K      100K     comments
wh1     16      4.36     4.55     4.57     4.57     simple identifier
wh2     16      4.42     4.56     4.57     4.57     array element
wh3     102     3.42     3.43     3.43     3.43     array as parameter
wh6     15      3.38     3.48     3.49     3.49     integer arithmetic
wh8     7       2.32     2.40     2.41     2.41     procedure call
wh10    5       3.64     3.83     3.84     3.85     integer arithmetic
mac     511     16.08    18.24    18.25    18.25    inner product
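Because 1 MFLOPS is one floating point operation per microsecond, the simulated time per iteration follows directly from Table 3 as flop / MFLOPS. The sketch below encodes the 100K-iteration column (the predicate names are ours). For mac, 511 flop at 18.25 MFLOPS gives 28 microseconds per iteration; 511 also equals 256 multiplications plus 255 additions, which would be one inner product of two 256-element vectors, but the text does not state the vector length, so that reading is only an inference.

```prolog
% Table 3, 100K-iteration column: result(Test, FlopPerIteration, MFLOPS).
result(wh1, 16, 4.57).
result(wh2, 16, 4.57).
result(wh3, 102, 3.43).
result(wh6, 15, 3.49).
result(wh8, 7, 2.41).
result(wh10, 5, 3.85).
result(mac, 511, 18.25).

% MFLOPS = flop per microsecond, so microseconds per iteration = flop / MFLOPS.
us_per_iteration(Test, Us) :-
    result(Test, Flop, Mflops),
    Us is Flop / Mflops.
```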

7.  Conclusions 

Our preliminary results are encouraging. The language constructs we developed allow clean, compact and easy to understand numeric programs. They also have semantics within kernel Prolog and efficient mappings into the specialized hardware of the ANP/PLM system. Simulation results indicate a performance of about 4 MFLOPS (in double precision; 2.41 to 4.57 MFLOPS at 100K iterations in Table 3) on selected modules of the Whetstone benchmark written in Prolog. Thus numeric performance is reasonably well matched with the PLM symbolic performance.